Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

This notebook provides a template for implementing the project in stages, all of which are required to complete it successfully. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation. Note that some implementation sections are optional and are marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.

In [1]:
checkpoint1_var = {}
checkpoint1_file = 'data/checkpoint1'
checkpoint1_force = False

Step 0: Load The Data

In [2]:
# Load pickled data
import pickle

# TODO: Fill this in based on where you saved the training and testing data

training_file = 'traffic-signs-data/train.p'
testing_file = 'traffic-signs-data/test.p'

with open(training_file, mode='rb') as f:
    train_provided = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test_provided = pickle.load(f)
    
X_train_provided, y_train_provided = train_provided['features'], train_provided['labels']
X_test_provided, y_test_provided = test_provided['features'], test_provided['labels']

Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 1D array containing the label/class id of each traffic sign image. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height), representing the original width and height of the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2) representing coordinates of a bounding box around the sign in the image. THESE COORDINATES ASSUME THE ORIGINAL IMAGE. THE PICKLED DATA CONTAINS RESIZED VERSIONS (32 by 32) OF THESE IMAGES.
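Because 'coords' refers to the original image while 'features' holds 32 by 32 resized copies, a bounding box has to be rescaled before it can be drawn on the pickled image. The following is a minimal sketch of that rescaling; `rescale_box` is a hypothetical helper (it is not part of the project template), and the example numbers are made up.

```python
def rescale_box(coords, original_size, target_size=(32, 32)):
    """Map (x1, y1, x2, y2) from the original image to the resized image.

    Hypothetical helper for illustration; not part of the project template.
    """
    x1, y1, x2, y2 = coords
    width, height = original_size
    target_w, target_h = target_size
    # Scale x by target_w / width and y by target_h / height
    return (x1 * target_w / width,
            y1 * target_h / height,
            x2 * target_w / width,
            y2 * target_h / height)


# Made-up example: a 64x48 original with a box from (8, 6) to (56, 42)
example_box = rescale_box((8, 6, 56, 42), (64, 48))
print(example_box)
```

With the real data, the same call would take one entry of 'coords' and the matching entry of 'sizes'.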

Complete the basic data summary below.

In [3]:
### Replace each question mark with the appropriate value.
import pickle

# TODO: Number of training examples
n_train_provided = len(X_train_provided)

# TODO: Number of testing examples.
n_test_provided = len(X_test_provided)

# TODO: What's the shape of a traffic sign image?
image_shape = X_train_provided.shape[1:]

# TODO: How many unique classes/labels there are in the dataset.
n_classes = len(set(y_train_provided))

print("Number of training examples =", n_train_provided)
print("Number of testing examples =", n_test_provided)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)

# Storing variables for checkpoint
checkpoint1_var['n_classes'] = n_classes
Number of training examples = 39209
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43

Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.

The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.

NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.

In [4]:
### Creating Label Dictionary
import csv

label_dict = None
with open('signnames.csv', mode='r') as infile:
    reader = csv.reader(infile)
    next(reader, None)
    label_dict = {int(rows[0]):rows[1] for rows in reader}

# Storing for checkpoint1
checkpoint1_var['label_dict'] = label_dict
In [5]:
### Data exploration visualization goes here

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Visualizing Data for Each Label
fig = plt.figure(figsize=(32,32), tight_layout={'h_pad':4})
for i in range(n_classes):
    k = np.where(y_train_provided==i)
    img = X_train_provided[k[0][0]]
    ax = fig.add_subplot(int(n_classes/4)+1,4,i+1) 
    ax.imshow(img, interpolation='none')
    ax.set_title(label_dict[y_train_provided[k[0][0]]])
plt.show()
In [6]:
### Label Distribution in Training Data
import pandas as pd

y_train_df = pd.DataFrame()
y_train_df['label'] = y_train_provided
ax = y_train_df['label'].value_counts().plot(kind='barh', figsize = (10,10), title='Samples per Class')
ax.set_yticklabels(list(map(lambda x: label_dict[x], y_train_df['label'].value_counts().index.tolist())))            
for i, v in enumerate(y_train_df['label'].value_counts()):
    ax.text(v + 10, i - 0.25, str(v), color='blue')

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Neural network architecture
  • Experiment with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. You are not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.

NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [7]:
### Preprocess the data here.
import os
import warnings
from tqdm import tqdm
from skimage import exposure

### Adaptive histogram equalization (CLAHE)
def normalized_image(image_data):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return np.array([exposure.equalize_adapthist(image, clip_limit=0.03) for image in tqdm(image_data)])

if not os.path.isfile(checkpoint1_file) or checkpoint1_force:
    print('Normalizing Training Images...')
    X_train_provided_normalized = normalized_image(X_train_provided)
    
    print('\nNormalizing Test Images...')
    X_test_provided_normalized = normalized_image(X_test_provided)
    
    checkpoint1_var['X_train_provided_normalized'] = X_train_provided_normalized
    checkpoint1_var['y_train_provided'] = y_train_provided
    checkpoint1_var['X_test_provided_normalized'] = X_test_provided_normalized
    checkpoint1_var['y_test_provided'] = y_test_provided

else:
    print('Checkpoint Already Exists. Data would be loaded from checkpoint. Set checkpoint1_force to force new file.')
  0%|          | 7/39209 [00:00<10:18, 63.43it/s]
Normalizing Training Images...
100%|██████████| 39209/39209 [10:33<00:00, 61.87it/s]
  0%|          | 7/12630 [00:00<03:22, 62.30it/s]
Normalizing Test Images...
100%|██████████| 12630/12630 [03:28<00:00, 60.49it/s]
In [10]:
### Storing Variables for Checkpoint1
import pickle

if not os.path.isfile(checkpoint1_file) or checkpoint1_force:
    print('Caching variables in pickle file...')
    try:
        with open(checkpoint1_file, 'wb') as pfile:
            pickle.dump(checkpoint1_var, pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', checkpoint1_file, ':', e)
        raise
    print('Data cached in pickle file.')
else:
    print('Pickle file already created. Set checkpoint1_force to force new file.')
Caching variables in pickle file...
Data cached in pickle file.

Question 1

Describe how you preprocessed the data. Why did you choose that technique?

Answer:
The images were normalized using adaptive histogram equalization, for the following reasons:

  • It scales the pixel values to the range 0-1, which helps gradient descent.
  • It reduces the impact of varying contrast and illumination across images, making the model more robust to both.
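To illustrate both points on a small scale, here is a sketch of plain (global) histogram equalization in NumPy. This is a simplified stand-in, not the adaptive, contrast-limited variant that `exposure.equalize_adapthist` applies in the notebook; `equalize_global` and the synthetic image are assumptions made for the example.

```python
import numpy as np


def equalize_global(channel, bins=256):
    """Global histogram equalization of one 2-D uint8 channel to floats in [0, 1].

    Simplified stand-in for illustration; the notebook itself uses
    skimage's adaptive (tile-based, contrast-limited) equalization.
    """
    hist, _ = np.histogram(channel.ravel(), bins=bins, range=(0, bins))
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]          # normalize the cumulative histogram so it ends at 1
    return cdf[channel]     # replace each pixel by its cumulative rank


rng = np.random.RandomState(0)
# Synthetic dark, low-contrast image: values only span 0..63
dark = rng.randint(0, 64, size=(32, 32)).astype(np.uint8)
out = equalize_global(dark)

print("input range :", dark.min() / 255.0, dark.max() / 255.0)
print("output range:", out.min(), out.max())
```

The output always lies in [0, 1] and is spread over most of that range even though the input only used the darkest quarter of the intensity scale.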

------------------ Checkpoint 1 ------------------

In [1]:
import pickle 
checkpoint1_file = 'data/checkpoint1'

with open(checkpoint1_file, 'rb') as pfile:
    pickle_data = pickle.load(pfile)
    # Expose every cached entry as a notebook-level variable
    globals().update(pickle_data)
    del pickle_data
    
checkpoint2_var = {}
checkpoint2_file = 'data/checkpoint2'
checkpoint2_force = False
In [2]:
### Generate additional data (OPTIONAL!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
In [3]:
from sklearn.model_selection import train_test_split

## Splitting Training Data into Training, Validation, Test
X_train, X_val_test, y_train, y_val_test = train_test_split(
    X_train_provided_normalized,
    y_train_provided,
    test_size=0.30,
    random_state=9345,
    stratify=y_train_provided
)

X_val, X_test, y_val, y_test = train_test_split(
    X_val_test,
    y_val_test,
    test_size=0.30,
    random_state=9345,
    stratify=y_val_test
)

del X_val_test
del y_val_test

checkpoint2_var['X_train'] = X_train.astype('float32')
checkpoint2_var['y_train'] = y_train
checkpoint2_var['X_val'] = X_val.astype('float32')
checkpoint2_var['y_val'] = y_val
checkpoint2_var['X_test'] = X_test.astype('float32')
checkpoint2_var['y_test'] = y_test
In [4]:
import keras
from keras.preprocessing.image import ImageDataGenerator

datagen = keras.preprocessing.image.ImageDataGenerator(
    rotation_range=8,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    fill_mode='nearest',
    horizontal_flip=False,
    vertical_flip=False
)
Using TensorFlow backend.
In [6]:
import os
import numpy as np
from tqdm import tqdm
from sklearn.utils import shuffle

total_image_per_class = 5000
X_train_augmented = np.empty((0,32,32,3))
y_train_augmented = np.empty(0,dtype='uint8')

if not os.path.isfile(checkpoint2_file) or checkpoint2_force:
    print('Augmenting Image Data...')
    for i in tqdm(range(n_classes)):
        index = [y_train==i]
        images_for_i_class = X_train[y_train==i]
        y_i_class = y_train[y_train==i]
        
        X_train_augmented_i = np.copy(images_for_i_class)
        y_train_augmented_i = np.copy(y_i_class)
        
        for X,y in datagen.flow(images_for_i_class, y_i_class, batch_size=len(y_i_class), seed=9345+i*37):            
            X_train_augmented_i = np.append(X_train_augmented_i, X, axis=0)
            y_train_augmented_i = np.append(y_train_augmented_i, y, axis=0)
            
            if len(X_train_augmented_i) >= total_image_per_class:
                break

        # Keep exactly total_image_per_class samples for this class
        X_train_augmented = np.append(X_train_augmented, X_train_augmented_i[:total_image_per_class], axis=0)
        y_train_augmented = np.append(y_train_augmented, y_train_augmented_i[:total_image_per_class], axis=0)
else:
    print('Pickle file already created. Set checkpoint2_force to force new file.')

X_train_augmented, y_train_augmented = shuffle(X_train_augmented, y_train_augmented, random_state=9345)

# Storing for checkpoint2
X_train_augmented = X_train_augmented.astype('float32')
checkpoint2_var['X_train_augmented'] = X_train_augmented
checkpoint2_var['y_train_augmented'] = y_train_augmented
  0%|          | 0/43 [00:00<?, ?it/s]
Augmenting Image Data...
100%|██████████| 43/43 [02:43<00:00,  5.56s/it]
In [7]:
checkpoint2_var.keys()
Out[7]:
dict_keys(['X_val', 'X_train_augmented', 'y_test', 'y_train_augmented', 'y_train', 'X_test', 'X_train', 'y_val'])
In [8]:
### Storing Variables for Checkpoint2

import joblib
joblib.dump(checkpoint2_var, checkpoint2_file)
Out[8]:
['data/checkpoint2']

Question 2

Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?

Answer:
The provided testing data is kept untouched, and the final model metrics are reported on this unseen data.

The provided training data was divided as follows:

  • Training Data: 70% of the original training data, augmented to obtain 5000 samples per class. The augmented set includes the normalized original data and newly generated data (created by randomly applying rotation, x-translation, y-translation, shear and zoom to the original data).
  • Validation Data: 21% of the original training data. This is used to check model performance while training.
  • Test Data: the remaining 9% of the original training data (distinct from the provided test data). This is used to fine-tune the model and to check how it performs on unseen data before running it on the provided test set.
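The percentages above follow from applying a 30% hold-out twice. The sketch below reproduces the arithmetic; it assumes the held-out part is rounded up and the rest kept, which is my understanding of how `train_test_split` sizes a fractional `test_size`, and `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, held_out_fraction):
    """Return (kept, held_out) sizes, rounding the held-out part up.

    The rounding rule is an assumption about train_test_split's behaviour.
    """
    n_held_out = math.ceil(held_out_fraction * n_samples)
    return n_samples - n_held_out, n_held_out


n_provided = 39209                                   # training examples reported in Step 1
n_train, n_val_test = split_sizes(n_provided, 0.30)  # first split: 70% / 30%
n_val, n_test = split_sizes(n_val_test, 0.30)        # second split of the 30%

for name, n in [("train", n_train), ("validation", n_val), ("test", n_test)]:
    print("{:<10} {:>6} ({:.1%})".format(name, n, n / n_provided))
```

The three parts come out at roughly 70%, 21% and 9% of the provided training data.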

------------------ Checkpoint 2 ------------------

In [1]:
import pickle


## Checkpoint1 Variables
checkpoint1_file = 'data/checkpoint1'
with open(checkpoint1_file, 'rb') as pfile:
    pickle_data = pickle.load(pfile)
    # Expose every cached entry as a notebook-level variable
    globals().update(pickle_data)
    del pickle_data

## Checkpoint2 Variables
import joblib
checkpoint2_file = 'data/checkpoint2'
pickle_data = joblib.load(checkpoint2_file)
# Expose every cached entry as a notebook-level variable
globals().update(pickle_data)
del pickle_data

Architecture

In [2]:
### Define your architecture here.
### Feel free to use as many code cells as needed.

import tensorflow as tf
from tensorflow.contrib.layers import flatten


def conv2d(x, W, b, strides, padding):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding=padding)
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def maxpool2d(x, k, padding):
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding=padding)


def LeNet(x):
    
    # Input = 32x32x3
    # Hyperparameters
    mu = 0
    sigma = 0.1
    
    #x_gray = tf.image.rgb_to_grayscale(x)
    
    ### Layer 1:
    # Convolutional. Output = 28x28x6
    # Pooling. Output = 14x14x6
    W1 = tf.Variable(tf.truncated_normal([5, 5, 3, 6], mean = mu, stddev = sigma), name='W1')
    b1 = tf.Variable(tf.zeros(6), name='b1')
    layer1 = conv2d(x, W1, b1, 1, 'VALID')
    pool1 = tf.nn.dropout(maxpool2d(layer1, 2, 'VALID'), keep_prob_conv)

    ### Layer 2:
    # Convolutional: Output = 10x10x16
    # Pooling. Output = 5x5x16
    W2 = tf.Variable(tf.truncated_normal([5, 5, 6, 16], mean = mu, stddev = sigma), name='W2')
    b2 = tf.Variable(tf.zeros(16), name='b2')
    layer2 = conv2d(pool1, W2, b2, 1, 'VALID')
    pool2 = tf.nn.dropout(maxpool2d(layer2, 2, 'VALID'), keep_prob_conv)
    
    ### Layer 3:
    # Fully Connected. Output = 120
    fc = flatten(pool2)
    W_fc1 = tf.Variable(tf.truncated_normal([400, 120], mean = mu, stddev = sigma), name='W_fc1')
    b_fc1 = tf.Variable(tf.zeros(120), name='b_fc1')
    fc1 = tf.nn.relu(tf.add(tf.matmul(fc, W_fc1), b_fc1))
    fc1_drop = tf.nn.dropout(fc1, keep_prob)

    ### Layer 4
    # Fully Connected. Output = 84
    W_fc2 = tf.Variable(tf.truncated_normal([120, 84], mean = mu, stddev = sigma), name='W_fc2')
    b_fc2 = tf.Variable(tf.zeros(84), name='b_fc2')
    fc2 = tf.nn.relu(tf.add(tf.matmul(fc1_drop, W_fc2), b_fc2))
    fc2_drop = tf.nn.dropout(fc2, keep_prob)
    

    ### Layer 5
    # Fully Connected. Output = n_classes
    W_fc3 = tf.Variable(tf.truncated_normal([84, 43], mean = mu, stddev = sigma), name='W_fc3')
    b_fc3 = tf.Variable(tf.zeros(43), name='b_fc3')
    logits = tf.add(tf.matmul(fc2_drop, W_fc3), b_fc3)
    
    return logits

Features and Labels

In [3]:
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None))
keep_prob_conv = tf.placeholder(tf.float32)
keep_prob = tf.placeholder(tf.float32)
one_hot_y = tf.one_hot(y, 43)

Loss Function and Learning Parameters

In [4]:
rate = 0.001
BATCH_SIZE = 128
EPOCHS = 30

logits = LeNet(x)
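# Keyword arguments avoid any ambiguity about argument order across TensorFlow versions
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_y)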
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)

Model Evaluation

In [5]:
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y, keep_prob:1, keep_prob_conv:1})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples

Training Setup

In [6]:
### Train your model here.
### Feel free to use as many code cells as needed.
from sklearn.utils import shuffle
from tqdm import tqdm

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    #loader = tf.train.import_meta_graph('lenet_gray.meta')
    #loader.restore(sess, tf.train.latest_checkpoint('./'))
    
    num_examples = len(X_train_augmented)
    
    print("Training...")
    print()
    for i in range(EPOCHS):
        print("EPOCH " + str(i+1))
        X_train, y_train = shuffle(X_train_augmented, y_train_augmented)
        for offset in tqdm(range(0, num_examples, BATCH_SIZE)):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y, keep_prob:1, keep_prob_conv:1})
            #break
         
        training_accuracy = evaluate(X_train_augmented, y_train_augmented)
        validation_accuracy = evaluate(X_val, y_val)
        print("Training Accuracy = {:.3f}".format(training_accuracy))
        print("Validation Accuracy = {:.3f}".format(validation_accuracy))
        print()
        #break
    
    import os
    try:
        saver
    except NameError:
        saver = tf.train.Saver()
    save_path='model1/model'
    if not os.path.isabs(save_path):
        save_path = os.path.abspath(os.path.join(os.getcwd(), save_path))
    saver.save(sess, save_path)
    print("Model saved")
    test_accuracy = evaluate(X_test, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy))
Training...

EPOCH 1
100%|██████████| 1680/1680 [00:18<00:00, 91.09it/s]
Training Accuracy = 0.842
Validation Accuracy = 0.921

EPOCH 2
100%|██████████| 1680/1680 [00:18<00:00, 93.24it/s]
Training Accuracy = 0.907
Validation Accuracy = 0.956

EPOCH 3
100%|██████████| 1680/1680 [00:17<00:00, 93.53it/s]
Training Accuracy = 0.933
Validation Accuracy = 0.966

EPOCH 4
100%|██████████| 1680/1680 [00:18<00:00, 93.24it/s]
Training Accuracy = 0.950
Validation Accuracy = 0.974

EPOCH 5
100%|██████████| 1680/1680 [00:17<00:00, 93.50it/s]
Training Accuracy = 0.955
Validation Accuracy = 0.970

EPOCH 6
100%|██████████| 1680/1680 [00:17<00:00, 93.39it/s]
Training Accuracy = 0.966
Validation Accuracy = 0.976

EPOCH 7
100%|██████████| 1680/1680 [00:17<00:00, 93.58it/s]
Training Accuracy = 0.970
Validation Accuracy = 0.983

EPOCH 8
100%|██████████| 1680/1680 [00:17<00:00, 93.34it/s]
Training Accuracy = 0.977
Validation Accuracy = 0.983

EPOCH 9
100%|██████████| 1680/1680 [00:17<00:00, 93.67it/s]
Training Accuracy = 0.970
Validation Accuracy = 0.979

EPOCH 10
100%|██████████| 1680/1680 [00:18<00:00, 93.23it/s]
Training Accuracy = 0.981
Validation Accuracy = 0.987

EPOCH 11
100%|██████████| 1680/1680 [00:18<00:00, 93.18it/s]
Training Accuracy = 0.982
Validation Accuracy = 0.989

EPOCH 12
100%|██████████| 1680/1680 [00:18<00:00, 93.30it/s]
Training Accuracy = 0.981
Validation Accuracy = 0.987

EPOCH 13
100%|██████████| 1680/1680 [00:17<00:00, 93.58it/s]
Training Accuracy = 0.985
Validation Accuracy = 0.988

EPOCH 14
100%|██████████| 1680/1680 [00:18<00:00, 93.21it/s]
Training Accuracy = 0.989
Validation Accuracy = 0.989

EPOCH 15
100%|██████████| 1680/1680 [00:18<00:00, 93.11it/s]
Training Accuracy = 0.988
Validation Accuracy = 0.987

EPOCH 16
100%|██████████| 1680/1680 [00:17<00:00, 93.48it/s]
Training Accuracy = 0.985
Validation Accuracy = 0.984

EPOCH 17
100%|██████████| 1680/1680 [00:17<00:00, 93.47it/s]
Training Accuracy = 0.989
Validation Accuracy = 0.987

EPOCH 18
100%|██████████| 1680/1680 [00:18<00:00, 92.14it/s]
Training Accuracy = 0.990
Validation Accuracy = 0.991

EPOCH 19
100%|██████████| 1680/1680 [00:18<00:00, 93.01it/s]
Training Accuracy = 0.989
Validation Accuracy = 0.990

EPOCH 20
100%|██████████| 1680/1680 [00:17<00:00, 93.39it/s]
Training Accuracy = 0.991
Validation Accuracy = 0.990

EPOCH 21
100%|██████████| 1680/1680 [00:17<00:00, 93.65it/s]
Training Accuracy = 0.990
Validation Accuracy = 0.989

EPOCH 22
100%|██████████| 1680/1680 [00:18<00:00, 93.14it/s]
Training Accuracy = 0.992
Validation Accuracy = 0.991

EPOCH 23
100%|██████████| 1680/1680 [00:18<00:00, 93.31it/s]
Training Accuracy = 0.993
Validation Accuracy = 0.987

EPOCH 24
100%|██████████| 1680/1680 [00:18<00:00, 93.31it/s]
Training Accuracy = 0.991
Validation Accuracy = 0.988

EPOCH 25
100%|██████████| 1680/1680 [00:17<00:00, 93.55it/s]
Training Accuracy = 0.994
Validation Accuracy = 0.991

EPOCH 26
100%|██████████| 1680/1680 [00:18<00:00, 93.24it/s]
Training Accuracy = 0.991
Validation Accuracy = 0.989

EPOCH 27
100%|██████████| 1680/1680 [00:17<00:00, 93.58it/s]
Training Accuracy = 0.991
Validation Accuracy = 0.989

EPOCH 28
100%|██████████| 1680/1680 [00:18<00:00, 93.19it/s]
Training Accuracy = 0.988
Validation Accuracy = 0.987

EPOCH 29
100%|██████████| 1680/1680 [00:17<00:00, 93.57it/s]
Training Accuracy = 0.991
Validation Accuracy = 0.987

EPOCH 30
100%|██████████| 1680/1680 [00:18<00:00, 93.32it/s]
Training Accuracy = 0.994
Validation Accuracy = 0.991

Model saved
Test Accuracy = 0.990

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer:

Modified LeNet Architecture
Input (32x32x3) -> Convolutional 5x5x6 -> ReLU -> MaxPool (stride of 2) -> Convolutional 5x5x16 -> ReLU -> MaxPool (stride of 2) -> Flatten (400) -> Fully Connected (120 units) -> Fully Connected (84 units) -> Fully Connected (43 units, output)
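The layer sizes in this chain can be checked with the usual 'VALID' output-size rule, (size - kernel) // stride + 1. The sketch below walks the 32x32 input through the two convolution/pooling stages and counts the trainable parameters; `valid_out` is a hypothetical helper, and the shapes are taken from the LeNet cell above.

```python
def valid_out(size, kernel, stride=1):
    """Spatial output size of a 'VALID' convolution or pooling window."""
    return (size - kernel) // stride + 1


size = 32                              # input is 32x32x3
size = valid_out(size, 5)              # conv 5x5, stride 1      -> 28
size = valid_out(size, 2, stride=2)    # max pool 2x2, stride 2  -> 14
size = valid_out(size, 5)              # conv 5x5, stride 1      -> 10
size = valid_out(size, 2, stride=2)    # max pool 2x2, stride 2  -> 5
flat = size * size * 16                # 16 feature maps after the second conv layer

# Weights plus biases for each layer
n_params = ((5 * 5 * 3 * 6 + 6)        # conv1
            + (5 * 5 * 6 * 16 + 16)    # conv2
            + (flat * 120 + 120)       # fc1
            + (120 * 84 + 84)          # fc2
            + (84 * 43 + 43))          # output layer
print(size, flat, n_params)            # 5 400 64811
```

The flattened size of 400 matches the first dimension of W_fc1 in the architecture cell.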

Calculating Model Accuracy on Provided Test Data: 95.2%

In [9]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    loader = tf.train.import_meta_graph('model1/model.meta')
    loader.restore(sess, tf.train.latest_checkpoint('./model1/'))

    test_accuracy = evaluate(X_test_provided_normalized, y_test_provided)
    print("Test Accuracy = {:.3f}".format(test_accuracy))
Test Accuracy = 0.952

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer:

  • Optimizer: AdamOptimizer with a learning rate of 0.001
  • Batch Size: 128
  • Epochs: 30
  • Weights initialized from a truncated normal distribution (mean=0, std dev=0.1)
  • Biases initialized with zeros
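As a quick consistency check on these settings, the number of mini-batches per epoch follows from the augmented training-set size (43 classes at 5000 images each) and the batch size. A small sketch of that arithmetic:

```python
import math

n_classes = 43
images_per_class = 5000      # total_image_per_class in the augmentation cell
batch_size = 128

n_train = n_classes * images_per_class
batches_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (batches_per_epoch - 1) * batch_size
print(n_train, batches_per_epoch, last_batch)   # 215000 1680 88
```

The 1680 batches agree with the progress-bar totals in the training log, and the final batch of each epoch holds only 88 images.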

Question 5

What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.

Answer:

  1. Preprocessed the data with adaptive histogram equalization to make training robust to varying contrast and illumination.
  2. Augmented the training data by randomly applying rotation, horizontal and vertical shifts, shear, and zoom.
  3. Trained with the LeNet architecture. Dropout was not used (keep probability of 1), as the model did not overfit.
  4. Chose the batch size and number of epochs using the generated validation and test sets.

Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [10]:
### Load the images and plot them here.
### Feel free to use as many code cells as needed.
In [11]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
In [12]:
import os

fig = plt.figure(figsize=(32,32), tight_layout={'h_pad':4})
i = 0
for file in os.listdir('data'):
    if '.jpg' in file:
        ax = fig.add_subplot(4,4,i+1) 
        img = mpimg.imread('data/' + file)
        ax.imshow(img)
        i+=1
plt.show()

Question 6

Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.

Answer:
The images were taken from this PDF. As they are themselves German traffic signs, the model should not find them difficult to classify.

In [13]:
### Run the predictions here.
### Feel free to use as many code cells as needed.
In [14]:
import os, sys
import numpy as np
from PIL import Image
import warnings
from skimage import exposure

### Adaptive histogram equalization (CLAHE)
def normalized_image(image_data):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return np.array([exposure.equalize_adapthist(image, clip_limit=0.03) for image in image_data])

size = 32, 32
logit = None

with tf.Session() as sess:
    loader = tf.train.import_meta_graph('model1/model.meta')
    loader.restore(sess, tf.train.latest_checkpoint('model1/'))
    sess = tf.get_default_session()
    
    
    fig = plt.figure(figsize=(32,32))
    i = 0
    for file in os.listdir('data'):
        if '.jpg' in file:
            im = Image.open('data/' + file)
            im.thumbnail(size, Image.ANTIALIAS)
            im_normalized = normalized_image(np.array([np.array(im)]))
            logit = sess.run(tf.argmax(logits, 1), feed_dict={x:im_normalized, keep_prob:1.0, keep_prob_conv:1})
            ax = fig.add_subplot(4,2,i+1) 
            img = mpimg.imread('data/' + file)
            ax.imshow(img)
            ax.set_title(label_dict[logit[0]])
            i+=1
    plt.show()

Question 7

Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to do this is to check the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.

NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, look up the name in signnames.csv and see if it matches the sign from the image.

Answer:
The model classifies all of the newly provided images correctly, an accuracy of 100%.

In [16]:
### Visualize the softmax probabilities here.
### Feel free to use as many code cells as needed.
In [24]:
import itertools
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # Normalize before plotting so that normalize=True affects the image
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes)
    plt.yticks(tick_marks, classes)

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
In [25]:
y_pred = None
with tf.Session() as sess:
    loader = tf.train.import_meta_graph('model1/model.meta')
    loader.restore(sess, tf.train.latest_checkpoint('model1/'))
    sess = tf.get_default_session()
    y_pred = sess.run(tf.argmax(logits, 1), feed_dict={x:X_test_provided_normalized, keep_prob:1.0, keep_prob_conv:1}) 
In [126]:
## Finding classes which were wrongly predicted
classes = [i for i in range(n_classes)]
plt.figure(figsize=(9,9))
plot_confusion_matrix(confusion_matrix(y_test_provided, y_pred), classes, normalize=False)
Confusion matrix, without normalization

The confusion matrix shows that class 11 was wrongly predicted as class 30 in many cases.
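Rather than reading the most confused pair off the plot, it can also be located programmatically as the largest off-diagonal cell. The sketch below shows the idea on a made-up 3-class matrix; `most_confused` is a hypothetical helper, and the numbers are not taken from the model.

```python
import numpy as np


def most_confused(cm):
    """Return (true_class, predicted_class, count) for the largest off-diagonal cell."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)      # ignore correct predictions
    true_cls, pred_cls = np.unravel_index(off_diag.argmax(), off_diag.shape)
    return int(true_cls), int(pred_cls), int(off_diag[true_cls, pred_cls])


# Toy matrix (rows = true label, columns = predicted label); values are made up
toy_cm = np.array([[50, 2, 1],
                   [4, 40, 9],
                   [0, 3, 60]])
print(most_confused(toy_cm))           # (1, 2, 9)
```

Applied to `confusion_matrix(y_test_provided, y_pred)`, the same function would return the true class, the predicted class and the number of such errors.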

In [34]:
label_dict[11]
Out[34]:
'Right-of-way at the next intersection'
In [35]:
label_dict[30]
Out[35]:
'Beware of ice/snow'
In [42]:
class_11 = np.where(y_test_provided==11)
y_class_11_pred = y_pred[class_11]
X_class_11_samples = X_test_provided_normalized[class_11]

class_11_predicted_as_class_30 = np.where(y_class_11_pred==30)
X_class_11_samples_predicted_as_class_30 = X_class_11_samples[class_11_predicted_as_class_30]
In [69]:
## Finding top-3 classes for incorrect predictions (k=3 below)

y_top_5 = None
with tf.Session() as sess:
    loader = tf.train.import_meta_graph('model1/model.meta')
    loader.restore(sess, tf.train.latest_checkpoint('model1/'))
    sess = tf.get_default_session()
    
    for X_incorrect in X_class_11_samples_predicted_as_class_30:
        y_top_5 = sess.run(tf.nn.top_k(logits, k=3), feed_dict={x:[X_incorrect], keep_prob:1.0, keep_prob_conv:1})
        print(y_top_5.indices)
[[30 11 21]]
[[30 11  3]]
[[30 11 28]]
[[30 11 21]]
[[30 11 28]]
[[30 11 28]]
[[30 11  3]]
[[30 11  3]]
[[30 11 18]]
[[30 11 28]]
[[30 11 28]]
[[30 11 25]]
[[30 11 23]]
[[30 18 11]]
[[30 11 29]]
[[30 11 18]]
[[30 11  3]]
[[30 11 28]]
[[30 11 28]]
[[30 11 12]]
[[30 11 21]]
[[30 11 21]]
[[30 11 25]]
[[30  3 11]]
[[30 11  1]]
[[30  3 11]]
[[30 11 42]]
[[30 11 28]]
[[30 11 28]]
[[30 11 28]]
[[30 11 23]]
[[30 11 28]]
[[30 11 23]]

The top-3 predictions contain class 11 in every one of these cases.

Repeating for All Incorrect Predictions

In [70]:
X_test_incorrect_pred = X_test_provided_normalized[y_pred != y_test_provided]
y_true_incorrect_pred = y_test_provided[y_pred != y_test_provided]

in_top_3 = None
in_top_5 = None
with tf.Session() as sess:
    loader = tf.train.import_meta_graph('model1/model.meta')
    loader.restore(sess, tf.train.latest_checkpoint('model1/'))
    sess = tf.get_default_session()
    
    in_top_3 = sess.run(tf.nn.in_top_k(logits, y_true_incorrect_pred, 3), feed_dict={x:X_test_incorrect_pred, keep_prob:1.0, keep_prob_conv:1})
    in_top_5 = sess.run(tf.nn.in_top_k(logits, y_true_incorrect_pred, 5), feed_dict={x:X_test_incorrect_pred, keep_prob:1.0, keep_prob_conv:1})
In [71]:
## Total Incorrect Count
len(X_test_incorrect_pred)
Out[71]:
604
In [72]:
## Misclassified samples whose true label is in the top 3
len(in_top_3[in_top_3==True])
Out[72]:
450
In [73]:
## Misclassified samples whose true label is in the top 5
len(in_top_5[in_top_5==True])
Out[73]:
536

Calculating Top n Accuracy

In [74]:
with tf.Session() as sess:
    loader = tf.train.import_meta_graph('model1/model.meta')
    loader.restore(sess, tf.train.latest_checkpoint('model1/'))
    sess = tf.get_default_session()
    
    in_top_3 = sess.run(tf.nn.in_top_k(logits, y_test_provided, 3), feed_dict={x:X_test_provided_normalized, keep_prob:1.0, keep_prob_conv:1})
    in_top_5 = sess.run(tf.nn.in_top_k(logits, y_test_provided, 5), feed_dict={x:X_test_provided_normalized, keep_prob:1.0, keep_prob_conv:1})

Top 3 Accuracy

In [75]:
100.0 * len(in_top_3[in_top_3==True])/len(y_test_provided)
Out[75]:
98.78068091844814

Top 5 Accuracy

In [76]:
100.0 * len(in_top_5[in_top_5==True])/len(y_test_provided)
Out[76]:
99.46159936658749
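The top-n accuracy above can be cross-checked without a TensorFlow session once the logits are available as an array. This is a minimal NumPy sketch under that assumption: `top_k_accuracy`, `toy_scores` and `toy_labels` are hypothetical names and data, and ties between scores may be resolved differently than by `tf.nn.in_top_k`.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Percentage of rows whose true label is among the k highest scores."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Column indices of the k largest scores per row (their order within the k is irrelevant)
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Made-up scores for 4 samples and 3 classes (hypothetical data)
toy_scores = np.array([[0.1, 0.7, 0.2],
                       [0.5, 0.3, 0.2],
                       [0.2, 0.3, 0.5],
                       [0.6, 0.1, 0.3]])
toy_labels = np.array([1, 1, 0, 2])

top1 = top_k_accuracy(toy_scores, toy_labels, 1)
top2 = top_k_accuracy(toy_scores, toy_labels, 2)
print(top1, top2)  # 25.0 75.0
```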

Question 8

Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)

tf.nn.top_k will return the values and indices (class ids) of the top k predictions. So if k=3, for each sign, it'll return the 3 largest probabilities (out of a possible 43) and the corresponding class ids.

Take this numpy array as an example:

# (5, 6) array
a = np.array([[ 0.24879643,  0.07032244,  0.12641572,  0.34763842,  0.07893497,
         0.12789202],
       [ 0.28086119,  0.27569815,  0.08594638,  0.0178669 ,  0.18063401,
         0.15899337],
       [ 0.26076848,  0.23664738,  0.08020603,  0.07001922,  0.1134371 ,
         0.23892179],
       [ 0.11943333,  0.29198961,  0.02605103,  0.26234032,  0.1351348 ,
         0.16505091],
       [ 0.09561176,  0.34396535,  0.0643941 ,  0.16240774,  0.24206137,
         0.09155967]])

Running it through sess.run(tf.nn.top_k(tf.constant(a), k=3)) produces:

TopKV2(values=array([[ 0.34763842,  0.24879643,  0.12789202],
       [ 0.28086119,  0.27569815,  0.18063401],
       [ 0.26076848,  0.23892179,  0.23664738],
       [ 0.29198961,  0.26234032,  0.16505091],
       [ 0.34396535,  0.24206137,  0.16240774]]), indices=array([[3, 0, 5],
       [0, 1, 4],
       [0, 5, 1],
       [1, 3, 5],
       [1, 4, 3]], dtype=int32))

Looking just at the first row, we get [ 0.34763842, 0.24879643, 0.12789202]. You can confirm these are the 3 largest probabilities in the first row of a. You'll also notice [3, 0, 5] are the corresponding indices.
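The same selection can be reproduced in plain NumPy, which may help when checking results outside a session. This is a sketch only: `top_k_numpy` is a hypothetical helper, not a TensorFlow function, and it uses just the first two rows of `a` from the example above.

```python
import numpy as np

def top_k_numpy(values, k):
    """Return (top values, their indices) per row, sorted from largest to smallest."""
    values = np.asarray(values)
    # argsort is ascending, so sort the negated array to get descending order
    indices = np.argsort(-values, axis=1)[:, :k]
    top_values = np.take_along_axis(values, indices, axis=1)
    return top_values, indices

# First two rows of the example array a
a = np.array([[0.24879643, 0.07032244, 0.12641572, 0.34763842, 0.07893497, 0.12789202],
              [0.28086119, 0.27569815, 0.08594638, 0.0178669, 0.18063401, 0.15899337]])

top_values, indices = top_k_numpy(a, k=3)
print(indices)
# [[3 0 5]
#  [0 1 4]]
```

`np.take_along_axis` requires NumPy 1.15 or newer.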

In [128]:
# Rows of cm are true labels, columns are predicted labels
cm = confusion_matrix(y_test_provided, y_pred)

for i in range(n_classes):
    total_true_count = np.sum(y_test_provided==i)
    total_true_pred = cm[i][i]
    # Precision: correct predictions / all samples predicted as class i (column sum)
    precision = total_true_pred / np.sum(cm[:, i])
    # Zero the diagonal entry on a copy of the row to find the most frequent wrong prediction
    row_errors = cm[i].copy()
    row_errors[i] = 0
    maximum_misclassified_as = np.argmax(row_errors)
    percentage_classification_max_confusion = row_errors[maximum_misclassified_as] / total_true_count * 100
    
    print("Class %s:" % label_dict[i])
    # "Accuracy" below is the per-class recall: correct predictions / true samples of class i
    print("  Accuracy = {:.2f}%".format(total_true_pred/total_true_count*100))
    print("  Precision = {:.2f}%".format(precision*100))
    
    if (row_errors[maximum_misclassified_as] != 0):
        print("  Maximum Misclassified as: %s" % label_dict[maximum_misclassified_as])
        print("  Misclassification Percentage for above class: {:.2f}%".format(percentage_classification_max_confusion))
    
    print()
Class Speed limit (20km/h):
  Accuracy = 96.67%
  Precision = 93.55%
  Maximum Misclassified as: Speed limit (30km/h)
  Misclassification Percentage for above class: 3.33%

Class Speed limit (30km/h):
  Accuracy = 98.61%
  Precision = 94.67%
  Maximum Misclassified as: Speed limit (80km/h)
  Misclassification Percentage for above class: 0.56%

Class Speed limit (50km/h):
  Accuracy = 97.60%
  Precision = 97.73%
  Maximum Misclassified as: Speed limit (30km/h)
  Misclassification Percentage for above class: 1.33%

Class Speed limit (60km/h):
  Accuracy = 92.44%
  Precision = 94.33%
  Maximum Misclassified as: Speed limit (80km/h)
  Misclassification Percentage for above class: 5.56%

Class Speed limit (70km/h):
  Accuracy = 95.30%
  Precision = 98.90%
  Maximum Misclassified as: Speed limit (30km/h)
  Misclassification Percentage for above class: 1.67%

Class Speed limit (80km/h):
  Accuracy = 92.38%
  Precision = 90.23%
  Maximum Misclassified as: Speed limit (60km/h)
  Misclassification Percentage for above class: 1.90%

Class End of speed limit (80km/h):
  Accuracy = 83.33%
  Precision = 96.90%
  Maximum Misclassified as: Speed limit (80km/h)
  Misclassification Percentage for above class: 7.33%

Class Speed limit (100km/h):
  Accuracy = 94.89%
  Precision = 96.61%
  Maximum Misclassified as: Speed limit (120km/h)
  Misclassification Percentage for above class: 2.89%

Class Speed limit (120km/h):
  Accuracy = 96.89%
  Precision = 93.76%
  Maximum Misclassified as: Speed limit (60km/h)
  Misclassification Percentage for above class: 0.89%

Class No passing:
  Accuracy = 98.12%
  Precision = 99.16%
  Maximum Misclassified as: No passing for vehicles over 3.5 metric tons
  Misclassification Percentage for above class: 0.83%

Class No passing for vehicles over 3.5 metric tons:
  Accuracy = 96.97%
  Precision = 98.46%
  Maximum Misclassified as: Speed limit (80km/h)
  Misclassification Percentage for above class: 0.91%

Class Right-of-way at the next intersection:
  Accuracy = 86.67%
  Precision = 95.54%
  Maximum Misclassified as: Beware of ice/snow
  Misclassification Percentage for above class: 7.86%

Class Priority road:
  Accuracy = 97.97%
  Precision = 99.56%
  Maximum Misclassified as: Roundabout mandatory
  Misclassification Percentage for above class: 0.72%

Class Yield:
  Accuracy = 98.61%
  Precision = 99.58%
  Maximum Misclassified as: No vehicles
  Misclassification Percentage for above class: 0.83%

Class Stop:
  Accuracy = 99.63%
  Precision = 97.46%
  Maximum Misclassified as: Priority road
  Misclassification Percentage for above class: 0.37%

Class No vehicles:
  Accuracy = 96.19%
  Precision = 92.24%
  Maximum Misclassified as: Speed limit (30km/h)
  Misclassification Percentage for above class: 2.86%

Class Vehicles over 3.5 metric tons prohibited:
  Accuracy = 100.00%
  Precision = 94.94%

Class No entry:
  Accuracy = 98.61%
  Precision = 100.00%
  Maximum Misclassified as: Stop
  Misclassification Percentage for above class: 1.39%

Class General caution:
  Accuracy = 91.28%
  Precision = 94.18%
  Maximum Misclassified as: Traffic signals
  Misclassification Percentage for above class: 2.82%

Class Dangerous curve to the left:
  Accuracy = 90.00%
  Precision = 98.18%
  Maximum Misclassified as: Slippery road
  Misclassification Percentage for above class: 8.33%

Class Dangerous curve to the right:
  Accuracy = 97.78%
  Precision = 72.73%
  Maximum Misclassified as: Bicycles crossing
  Misclassification Percentage for above class: 1.11%

Class Double curve:
  Accuracy = 65.56%
  Precision = 80.82%
  Maximum Misclassified as: Traffic signals
  Misclassification Percentage for above class: 16.67%

Class Bumpy road:
  Accuracy = 87.50%
  Precision = 97.22%
  Maximum Misclassified as: Road work
  Misclassification Percentage for above class: 7.50%

Class Slippery road:
  Accuracy = 94.67%
  Precision = 92.21%
  Maximum Misclassified as: Beware of ice/snow
  Misclassification Percentage for above class: 2.00%

Class Road narrows on the right:
  Accuracy = 93.33%
  Precision = 94.38%
  Maximum Misclassified as: Traffic signals
  Misclassification Percentage for above class: 2.22%

Class Road work:
  Accuracy = 94.38%
  Precision = 95.37%
  Maximum Misclassified as: General caution
  Misclassification Percentage for above class: 2.71%

Class Traffic signals:
  Accuracy = 95.56%
  Precision = 82.30%
  Maximum Misclassified as: Road work
  Misclassification Percentage for above class: 3.33%

Class Pedestrians:
  Accuracy = 80.00%
  Precision = 81.36%
  Maximum Misclassified as: Bicycles crossing
  Misclassification Percentage for above class: 20.00%

Class Children crossing:
  Accuracy = 97.33%
  Precision = 94.19%
  Maximum Misclassified as: Bicycles crossing
  Misclassification Percentage for above class: 2.00%

Class Bicycles crossing:
  Accuracy = 100.00%
  Precision = 77.59%

Class Beware of ice/snow:
  Accuracy = 80.67%
  Precision = 74.23%
  Maximum Misclassified as: Dangerous curve to the right
  Misclassification Percentage for above class: 6.67%

Class Wild animals crossing:
  Accuracy = 95.19%
  Precision = 98.47%
  Maximum Misclassified as: Dangerous curve to the right
  Misclassification Percentage for above class: 2.96%

Class End of all speed and passing limits:
  Accuracy = 100.00%
  Precision = 95.24%

Class Turn right ahead:
  Accuracy = 99.52%
  Precision = 98.58%
  Maximum Misclassified as: Ahead only
  Misclassification Percentage for above class: 0.48%

Class Turn left ahead:
  Accuracy = 100.00%
  Precision = 92.31%

Class Ahead only:
  Accuracy = 95.13%
  Precision = 99.46%
  Maximum Misclassified as: Turn left ahead
  Misclassification Percentage for above class: 1.54%

Class Go straight or right:
  Accuracy = 99.17%
  Precision = 90.84%
  Maximum Misclassified as: Roundabout mandatory
  Misclassification Percentage for above class: 0.83%

Class Go straight or left:
  Accuracy = 98.33%
  Precision = 81.94%
  Maximum Misclassified as: Turn right ahead
  Misclassification Percentage for above class: 1.67%

Class Keep right:
  Accuracy = 94.49%
  Precision = 99.85%
  Maximum Misclassified as: Go straight or left
  Misclassification Percentage for above class: 1.30%

Class Keep left:
  Accuracy = 98.89%
  Precision = 100.00%
  Maximum Misclassified as: Turn right ahead
  Misclassification Percentage for above class: 1.11%

Class Roundabout mandatory:
  Accuracy = 95.56%
  Precision = 81.90%
  Maximum Misclassified as: Go straight or right
  Misclassification Percentage for above class: 2.22%

Class End of no passing:
  Accuracy = 85.00%
  Precision = 87.93%
  Maximum Misclassified as: End of no passing by vehicles over 3.5 metric tons
  Misclassification Percentage for above class: 13.33%

Class End of no passing by vehicles over 3.5 metric tons:
  Accuracy = 92.22%
  Precision = 80.58%
  Maximum Misclassified as: End of no passing
  Misclassification Percentage for above class: 6.67%

Answer:

The model has a top-5 accuracy of 99.46% and a top-3 accuracy of 98.78%. Of the 604 misclassified test samples, the correct class appears in the top 3 for 450 and in the top 5 for 536. The per-class accuracy and precision are above 90% for most classes. The confusion matrix for the model is drawn below.

The model most often confuses classes such as:

  • End of no passing by vehicles over 3.5 metric tons and End of no passing
  • Children crossing, Bicycles crossing and Pedestrians
  • End of speed limit (80km/h) and Speed limit (80km/h)
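The per-class figures quoted in this answer can also be computed for all classes at once from the confusion matrix. The sketch below is one possible vectorized formulation, assuming rows are true labels and columns are predicted labels. `per_class_recall_precision` and `toy_cm` are hypothetical, and what the report prints as per-class "Accuracy" corresponds to recall here.

```python
import numpy as np

def per_class_recall_precision(cm):
    """Rows of cm are true labels, columns are predicted labels."""
    cm = np.asarray(cm, dtype=np.float64)
    correct = np.diag(cm)
    true_counts = cm.sum(axis=1)       # samples that really belong to each class
    predicted_counts = cm.sum(axis=0)  # samples assigned to each class
    # Classes with no samples (or no predictions) are reported as 0 instead of dividing by zero
    recall = np.divide(correct, true_counts,
                       out=np.zeros_like(correct), where=true_counts > 0)
    precision = np.divide(correct, predicted_counts,
                          out=np.zeros_like(correct), where=predicted_counts > 0)
    return recall, precision

# Toy 3-class matrix with made-up counts (hypothetical data)
toy_cm = np.array([[8, 2, 0],
                   [1, 9, 0],
                   [1, 0, 9]])
recall, precision = per_class_recall_precision(toy_cm)
print(np.round(recall * 100, 2))
print(np.round(precision * 100, 2))
```

A class with high recall but low precision, such as Bicycles crossing in the report above, is one that other classes are frequently mistaken for.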
In [151]:
classes = [i for i in range(n_classes)]
plt.figure(figsize=(9,9))
plot_confusion_matrix(confusion_matrix(y_test_provided, y_pred), classes, normalize=False)
Confusion matrix, without normalization

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.